BioLite, a Lightweight Bioinformatics Framework with Automated Tracking of Diagnostics and Provenance
Abstract
We present a new Python/C++ framework, BioLite, for implementing bioinformatics pipelines for Next-Generation Sequencing (NGS) data. BioLite tracks provenance of analyses, automates the collection and reporting of diagnostics (such as summary statistics and plots at intermediate stages), and profiles computational requirements. These diagnostics can be accessed across multiple stages of a pipeline, from other pipelines, and in HTML reports. Finally, we describe several use cases for diagnostics in our own analyses.
Similar resources
Mapping the NRC Dataflow Model to the Open Provenance Model
The Open Provenance Model (OPM) has recently been proposed as an exchange framework for workflow provenance information. In this paper we show how the NRC data model for workflow repositories can be mapped to the OPM. Our mapping includes such features as complex data flow in an execution of a workflow; different workflows in the repository that call each other; and the tracking of subvalues of...
Semantic Representation of Provenance in Wikipedia
Wikis are often considered as being a wide source of information. However, identifying provenance information about their content is crucial, whether it is for computing trust in public wiki pages or to identify experts in corporate wikis. In this paper, we address this issue by providing a lightweight ontology for provenance management in wikis, based on the W7 model. Furthermore, we showcase ...
BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments
Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this w...
Wrangling Galaxy’s reference data
The Galaxy platform has developed into a fully featured collaborative workbench, with goals of inherently capturing provenance to enable reproducible data analysis, and of making it straightforward to run one's own server. However, many Galaxy platform tools rely on the presence of reference data, such as alignment indexes, to function efficiently. Until now, the building of this cac...
Tracking Provenance in ORNL’s Flexible Research Platforms
Provenance is defined as information about the origin of objects, a concept that applies to both physical and digital objects and often overlaps both. The use of provenance in systems designed for research is an important but forgotten feature. Provenance allows for proper and exact tracking of information, its use, its lineage, its derivations and other metadata that are important for correctl...